Handling high-dimensional data in air pollution forecasting tasks

Authors

  • Diana Domanska
  • Szymon Lukasik
Abstract

This paper proposes methods for handling the high-dimensional weather forecast data used to predict concentrations of PM10, PM2.5, SO2, NO, CO and O3. Pollution forecasting normally requires historical data samples for a large number of points in time: weather forecast data, actual weather data and pollution data. It also typically involves numerous features describing atmospheric conditions. Consequently, analyzing such datasets to generate accurate forecasts becomes a very cumbersome task. The paper examines a variety of unsupervised dimensionality reduction methods aimed at obtaining a compact yet informative set of features. As an alternative, an approach that uses fractional distances for data analysis is also considered. Both strategies were evaluated on real-world data obtained from the Institute of Meteorology and Water Management in Katowice (Poland), with the extended Air Pollution Forecast Model (e-APFM) as the underlying prediction tool. Using a fractional distance as the dissimilarity measure was found to give the best forecasting accuracy. Satisfactory results can also be obtained with Isomap, Landmark Isomap and Factor Analysis as dimensionality reduction techniques. These methods can also be used to formulate a universal mapping, ready to use for data gathered in different geographical areas.

Preprint submitted to Ecological Informatics, November 16, 2016
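The abstract does not give the formula for the fractional distance. A common definition, assumed here, is the Minkowski-type form d_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p) with 0 < p < 1. The minimal sketch below also assumes, purely for illustration, that the dissimilarity is used to retrieve the historical sample most similar to a new forecast vector. The exponent p = 0.5, the function names `fractional_distance` and `most_similar`, and the toy vectors are hypothetical; none of them are taken from the paper or from e-APFM.

```python
from typing import Sequence


def fractional_distance(x: Sequence[float], y: Sequence[float], p: float = 0.5) -> float:
    """Minkowski-type dissimilarity with a fractional exponent 0 < p < 1.

    For p < 1 this is not a metric (the triangle inequality can fail),
    but it can still serve as a dissimilarity measure.
    The default p = 0.5 is an assumption for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1) for a fractional distance")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def most_similar(query: Sequence[float],
                 history: Sequence[Sequence[float]],
                 p: float = 0.5) -> int:
    """Return the index of the historical vector closest to `query`.

    Hypothetical helper: a plain linear scan over `history`.
    """
    return min(range(len(history)),
               key=lambda i: fractional_distance(query, history[i], p))


if __name__ == "__main__":
    # Toy feature vectors (hypothetical, not real weather data).
    history = [[10.0, 2.0, 0.5], [12.0, 3.5, 0.7], [20.0, 1.0, 0.1]]
    query = [11.5, 3.0, 0.6]
    print(most_similar(query, history))
```

Because raw weather features live on very different scales, one would presumably standardize them before computing any distance; the sketch omits that step to stay short.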

Similar sources

Forecasting Air Pollution Concentrations in Iran, Using a Hybrid Model

The present study aims at developing a forecasting model to predict the next year’s air pollution concentrations in the atmosphere of Iran. In this regard, it proposes the use of ARIMA, SVR, and TSVR, as well as hybrid ARIMA-SVR and ARIMA-TSVR models, which combined the autoregressive part of the autoregressive integrated moving average (ARIMA) model with the support vector regression technique...

The fuzzy logic in air pollution forecasting model

In the paper a model to predict the concentrations of particulate matter PM10, PM2.5, SO2, NO, CO and O3 for a chosen number of hours forward is proposed. The method requires historical data for a large number of points in time, particularly weather forecast data, actual weather data and pollution data. The idea is that by matching forecast data with similar forecast data in the historical data...

Forecasting Ozone Density in Tehran Air Using a Smart Data-Driven Approach

Introduction: As a metropolitan area in Iran, Tehran is exposed to damage from air pollution due to its large population and pollutants from various sources. Accordingly, research on damage induced by air pollution in this city seems necessary. The main purpose of this study was to forecast ozone in the city of Tehran. Considering the hazards of ozone (O3) gas on human health and the environmen...

Capabilities of data assimilation in correcting sea surface temperature in the Persian Gulf

Predicting the quality of water and air is a particular challenge for forecasting systems that support them. In order to represent the small-scale phenomena, a high-resolution model needs accurate capture of air and sea circulations, significant for forecasting environmental pollution. Data assimilation is one of the state of the art methods to be used for this purpose. Due to the importance of...

Journal:
  • Ecological Informatics

Volume 34

Publication date: 2016